ELexBI, A BASIC TOOL FOR BILINGUAL TERM EXTRACTION FROM SPANISH-BASQUE PARALLEL CORPORA

Authors

  • A. Gurrutxaga
  • X. Saralegi
  • S. Ugartetxea
  • Iñaki Alegria
Abstract

We present the work done by the Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach has three steps: a monolingual extraction of term candidates in each language, the creation of candidate bigrams from both segments of the same translation unit, and, finally, the selection of the most likely pair of candidates, based mainly on statistical information (association measures) and cognates. In the first step, we use linguistic techniques for the extraction of term candidates. The result of our work is ELexBI, a prototype tool that can extract equivalent terms from Spanish-Basque translation memories. This work is intended as a contribution to corpus-based bilingual lexicography and terminology in Basque.
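The pairing and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the ELexBI implementation: the abstract does not name the association measure or the cognate test, so the Dice coefficient, the longest-common-subsequence ratio, the equal weighting, and all function names below are assumptions chosen for illustration. The monolingual term candidates are taken as already extracted.

```python
from collections import Counter
from itertools import product


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cognate_score(es: str, eu: str) -> float:
    """LCS ratio: LCS length divided by the length of the longer string.

    Assumption: the abstract only says "cognates"; this is one common proxy.
    """
    if not es or not eu:
        return 0.0
    return lcs_length(es.lower(), eu.lower()) / max(len(es), len(eu))


def rank_pairs(units, cognate_weight=0.5):
    """Rank Spanish-Basque candidate pairs across translation units.

    units: list of (spanish_candidates, basque_candidates), one per unit.
    Returns (score, spanish_term, basque_term) tuples, best first.
    """
    es_freq, eu_freq, pair_freq = Counter(), Counter(), Counter()
    for es_terms, eu_terms in units:
        es_set, eu_set = set(es_terms), set(eu_terms)
        es_freq.update(es_set)
        eu_freq.update(eu_set)
        # Every cross-language pair within one unit is a candidate pair.
        pair_freq.update(product(es_set, eu_set))

    scored = []
    for (es, eu), joint in pair_freq.items():
        # Dice coefficient as the association measure (an assumption).
        dice = 2 * joint / (es_freq[es] + eu_freq[eu])
        score = (1 - cognate_weight) * dice + cognate_weight * cognate_score(es, eu)
        scored.append((score, es, eu))
    return sorted(scored, reverse=True)
```

Calling `rank_pairs` with one candidate list per language for each translation unit returns all co-occurring pairs ordered by the combined score. A real system would also filter by a frequency or score threshold.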


Similar resources

Computational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora

We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...


Named Entities Translation Based On Comparable Corpora

In this paper we present a system for translating named entities from Basque to Spanish based on comparable corpora. For that purpose we have tried two approaches: one based on Basque linguistic features, and a language-independent tool. For both tools we have used Basque-Spanish comparable corpora, a bilingual dictionary and the web as resources.


EACL - 2006 11th Conference of the European Chapter of the Association for

In this paper we present a system for translating named entities from Basque to Spanish based on comparable corpora. For that purpose we have tried two approaches: one based on Basque linguistic features, and a language-independent tool. For both tools we have used Basque-Spanish comparable corpora, a bilingual dictionary and the web as resources.


Bilingual Terminology Extraction in Sketch Engine

We present a method of bilingual terminology extraction from parallel corpora, along with a few heuristics and experiments aimed at improving the performance of the basic variant of the method. An evaluation is given using a small gold standard manually prepared for the English-Czech language pair from the DGT translation memory [1]. The bilingual terminology extraction (ABTE3) is available for several languages in...


Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar results to those from parallel corpus...



Publication date: 2006